Smart Paper Notes

Deep Structural Causal Shape Models

Rajat Rasal, Daniel C. Castro, Nick Pawlowski, Ben Glocker

Categories: Computer Vision | Machine Learning

2022-08-23

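Causal reasoning provides a language for asking important interventional and counterfactual questions that go beyond purely statistical association. In medical imaging, for example, we may wish to study the causal effect of genetic, environmental, or lifestyle factors on the normal and pathological variation of anatomical phenotypes. However, while anatomical shape models of 3D surface meshes extracted from automated image segmentation can be constructed reliably, computational tools that enable causal reasoning about morphological variation are lacking. To address this, we propose deep structural causal shape models (CSMs), which use high-quality mesh generation techniques from geometric deep learning within the expressive framework of deep structural causal models. CSMs enable subject-specific prognoses through counterfactual mesh generation ("How would this patient's brain structure change if they were ten years older?"), in contrast to most current work on purely population-level statistical shape modelling. Through a number of qualitative and quantitative experiments on a large dataset of 3D brain structures, we demonstrate the capabilities of CSMs at all levels of Pearl's causal hierarchy.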
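
The counterfactual question quoted in the abstract can be illustrated with the standard three-step procedure for structural causal models (abduction, action, prediction). The sketch below is a scalar toy, not the mesh model from the paper: the function `mechanism`, its linear form, and all numbers are hypothetical and chosen only to make the steps visible.

```python
def mechanism(age, noise):
    """Hypothetical structural equation: a structure's volume shrinks linearly with age.

    The coefficients are made up for illustration and are not taken from the paper.
    """
    return 1500.0 - 3.0 * age + noise


def counterfactual_volume(observed_age, observed_volume, new_age):
    # Abduction: recover the subject-specific noise term from the observation.
    noise = observed_volume - mechanism(observed_age, 0.0)
    # Action and prediction: set the new age, keep the same noise, re-evaluate.
    return mechanism(new_age, noise)


# "How would this subject's volume change if they were ten years older?"
volume_now = 1340.0
volume_older = counterfactual_volume(observed_age=60, observed_volume=volume_now, new_age=70)
```

Because the noise term is inferred from the individual observation and then held fixed, the answer is specific to that subject rather than a population average.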

translated by Google Translate

Do DALL-E and Flamingo Understand Each Other?

Hang Li, Jindong Gu, Rajat Koner, Sahand Sharifzadeh, Volker Tresp

Categories: Computer Vision | Machine Learning

2022-12-23

A major goal of multimodal research is to improve machine understanding of images and text. Tasks include image captioning, text-to-image generation, and vision-language representation learning. So far, research has focused on the relationships between images and text. For example, captioning models attempt to understand the semantics of images, which are then transformed into text. An important question is: which annotation best reflects a deep understanding of image content? Similarly, given a text, which image best presents its semantics? In this work, we argue that the best text or caption for a given image is the text that would generate the image most similar to that image. Likewise, the best image for a given text is the image that yields the caption best aligned with the original text. To this end, we propose a unified framework that includes both a text-to-image generative model and an image-to-text generative model. Extensive experiments validate our approach.
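
The selection rule argued for above (the best caption is the one whose regenerated image is closest to the original) can be sketched as a simple reranking step. Everything here is a toy stand-in: `toy_text_to_image` replaces a real text-to-image model with a bag-of-words feature vector, and `VOCAB`, `cosine`, and `best_caption` are hypothetical names, not part of the paper's framework.

```python
import math

# Toy vocabulary used by the stand-in generator (an assumption for illustration).
VOCAB = ["dog", "cat", "grass", "sofa", "running", "sleeping"]


def toy_text_to_image(caption):
    """Hypothetical generator: maps a caption to a bag-of-words 'image' feature vector."""
    words = caption.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_caption(image_features, candidates, generate=toy_text_to_image):
    """Return the candidate whose regenerated image is most similar to the original."""
    return max(candidates, key=lambda c: cosine(generate(c), image_features))


image = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # features for: dog, grass, running
captions = ["a cat sleeping on a sofa", "a dog on grass", "a dog running on grass"]
chosen = best_caption(image, captions)
```

The symmetric rule for choosing the best image for a text would swap the roles: caption each candidate image and keep the one whose caption is closest to the original text.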

A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness

Tiantian Feng, Rajat Hebbar, Nicholas Mehlman, Xuan Shi, Aditya Kommineni, and Shrikanth Narayanan

Categories: Machine Learning

2022-12-18

Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to be considered more trustworthy for broader deployment. Specifically, concerns over privacy breaches, discriminatory performance, and vulnerability to adversarial attacks have all been identified in ML research. To address these challenges and risks, significant effort has gone into ensuring that these ML systems are trustworthy, in particular private, safe, and fair. In this paper, we conduct the first comprehensive survey on speech-centric trustworthy ML topics related to privacy, safety, and fairness. In addition to serving as a summary report for the research community, we point out several promising future research directions to inspire researchers who wish to explore this area further.

Spatio-Temporal Super-Resolution of Dynamical Systems using Physics-Informed Deep-Learning

Rajat Arora, Ankit Shrivastava

Categories: Machine Learning

2022-12-08

This work presents a physics-informed deep learning-based super-resolution framework to enhance the spatio-temporal resolution of solutions of time-dependent partial differential equations (PDEs). Prior work on deep learning-based super-resolution models has shown promise in accelerating engineering design by reducing the computational expense of traditional numerical schemes. However, these models rely heavily on the availability of high-resolution (HR) labeled data during training. In this work, we propose a physics-informed deep learning-based framework to enhance the spatial and temporal resolution of PDE solutions that are coarse in both space and time, without requiring any HR data. The framework consists of two trainable modules that independently super-resolve the PDE solution, first in the spatial and then in the temporal direction. The physics-based losses are implemented in a novel way to ensure tight coupling between the spatio-temporally refined outputs at different times and to improve framework accuracy. We analyze the capability of the developed framework by investigating its performance on an elastodynamics problem. The proposed framework successfully super-resolves the low-resolution PDE solutions in both space and time while satisfying physics-based constraints and yielding high accuracy. Furthermore, the analysis and the obtained speed-up show that the proposed framework is well suited for integration with traditional numerical methods to reduce computational complexity during engineering design.
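
The abstract does not spell out the form of its physics-based losses. As a rough illustration of the general idea (penalizing the PDE residual of a predicted field so that no HR labels are needed), the sketch below assumes a 1D wave equation u_tt = c^2 u_xx as a simple stand-in for elastodynamics and evaluates the residual with central finite differences. The function name `wave_residual_loss` and the grid sizes are hypothetical.

```python
import math


def wave_residual_loss(u, dx, dt, c):
    """Mean squared residual of u_tt - c^2 * u_xx over interior grid points.

    u[n][i] is the field at time step n and spatial node i.
    """
    total, count = 0.0, 0
    for n in range(1, len(u) - 1):
        for i in range(1, len(u[n]) - 1):
            u_tt = (u[n + 1][i] - 2 * u[n][i] + u[n - 1][i]) / dt ** 2
            u_xx = (u[n][i + 1] - 2 * u[n][i] + u[n][i - 1]) / dx ** 2
            total += (u_tt - c ** 2 * u_xx) ** 2
            count += 1
    return total / count


c, dx, dt = 1.0, 0.05, 0.05
# Standing-wave solution sin(pi x) cos(pi c t) sampled on a 21 x 21 grid.
exact = [
    [math.sin(math.pi * i * dx) * math.cos(math.pi * c * n * dt) for i in range(21)]
    for n in range(21)
]
# The same field with a small spatial oscillation added, which violates the PDE.
noisy = [[v + 0.01 * (-1) ** i for i, v in enumerate(row)] for row in exact]

loss_exact = wave_residual_loss(exact, dx, dt, c)
loss_noisy = wave_residual_loss(noisy, dx, dt, c)
```

A field that satisfies the equation gives a residual near zero, while the perturbed field is penalized heavily; in a training loop this scalar would be minimized with respect to the network output instead of a supervised error.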

MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference

Yongqin Wang, Rachit Rajat, Murali Annavaram

Categories: Machine Learning

2022-09-27

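Over the past few years, multi-party computation (MPC) has been gaining popularity as a secure computing model, particularly for machine learning (ML) inference. Compared with its competitors, MPC has less overhead than homomorphic encryption (HE) and a stronger threat model than hardware-based trusted execution environments (TEEs) such as Intel SGX. Despite these apparent advantages, MPC protocols still pay a substantial performance penalty compared with plaintext computation when applied to ML algorithms. The overhead comes from added computation and communication costs. For multiplications, which are ubiquitous in ML algorithms, MPC protocols add 32x more computation and one round of broadcast among the MPC servers. Moreover, ML computations that have trivial cost in plaintext, such as Softmax, ReLU, and other non-linear operations, become very expensive because of the added communication. These overheads make MPC less attractive for deployment in real-time ML inference frameworks such as speech translation. In this work, we present MPC-Pipe, an MPC pipeline inference technique that uses two ML-specific approaches: 1) an inter-linear-layer pipeline and 2) an inner-layer pipeline. These two techniques shorten the total inference runtime of machine learning models. Compared with current MPC protocol implementations, our experiments show a reduction in ML inference latency of up to 12.6% when model weights are public.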
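
To make the cost of a secure multiplication concrete, the sketch below shows multiplication on two-party additive secret shares using a Beaver triple, a common textbook construction. It is not claimed to be the protocol evaluated in the paper; the modulus `P`, the trusted-dealer setup, and all function names are assumptions for illustration.

```python
import random

P = 2 ** 61 - 1  # prime modulus for the toy field (an assumption)


def share(x, rng):
    """Split x into two additive shares modulo P."""
    r = rng.randrange(P)
    return r, (x - r) % P


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % P


def beaver_multiply(x_sh, y_sh, rng):
    """Multiply two secret-shared values and return shares of the product."""
    # Offline phase: a trusted dealer samples a triple with c = a * b and shares it.
    a, b = rng.randrange(P), rng.randrange(P)
    a_sh, b_sh, c_sh = share(a, rng), share(b, rng), share(a * b % P, rng)
    # Online phase: one communication round in which the parties open
    # d = x - a and e = y - b; these masked values reveal nothing about x or y.
    d = reconstruct((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)
    e = reconstruct((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P)
    # Local step: x*y = c + d*b + e*a + d*e; only party 0 adds the public d*e term.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z0, z1


rng = random.Random(0)
x_sh, y_sh = share(6, rng), share(7, rng)
product = reconstruct(*beaver_multiply(x_sh, y_sh, rng))
```

Each multiplication needs extra local arithmetic plus a round of communication to open `d` and `e`, which is the kind of per-operation overhead that a pipelining scheme tries to overlap with other work.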

Leveraging Large Language Models for Robot 3D Scene Understanding

William Chen, Siyi Hu, Rajat Talak, Luca Carlone

Categories: Robotics | Natural Language Processing | Computer Vision | Machine Learning

2022-09-12

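Semantic 3D scene understanding is a problem of critical importance in robotics. Although significant advances have been made in spatial perception, robots are still far from having an average person's common-sense knowledge of household objects and locations. We therefore investigate the use of large language models to impart common sense for scene understanding. Specifically, we introduce three paradigms for leveraging language to classify rooms in indoor environments based on the objects they contain: (i) a zero-shot approach, (ii) a feed-forward classifier approach, and (iii) a contrastive classifier approach. These methods operate on 3D scene graphs produced by modern spatial perception systems. We then analyze each approach and demonstrate notable zero-shot generalization and transfer capabilities that stem from the use of language. Finally, we show that these approaches also apply to inferring building labels from the rooms a building contains, and we demonstrate our zero-shot approach in a real environment. All code is available at https://github.com/mit-spark/llm_scene_understanding.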

Machine Learning For Classification Of Antithetical Emotional States

Jeevanshi Sharma, Rajat Maheshwari, Yusuf Uzzaman Khan

Categories: Machine Learning

2022-09-06

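Emotion classification from EEG signals has seen many advances. However, issues such as the lack of data and the difficulty of learning important features and patterns have always left room for improvement in both computation and prediction accuracy. This work analyzes the performance of baseline machine learning classifiers on the DEAP dataset alongside a tabular learning approach that delivers results comparable to the state of the art, gaining a performance boost from its deep learning architecture without deploying heavy neural networks.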

InstanceFormer: An Online Video Instance Segmentation Framework

Rajat Koner, Tanveer Hannan, Suprosanna Shit, Sahand Sharifzadeh, Matthias Schubert, Thomas Seidl, Volker Tresp

Categories: Computer Vision

2022-08-22

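Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full spatio-temporal attention limit their use in real-life applications such as processing lengthy videos. In this paper, we propose an efficient single-stage transformer-based online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependencies and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look at earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence on the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial for long-range dependency modeling, including challenging scenarios such as occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin on multiple datasets. Most importantly, InstanceFormer surpasses offline approaches on challenging and long datasets such as YouTube-VIS-2021 and OVIS. Code is available at https://github.com/rajatkoner08/instanceformer.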
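
The abstract names a temporal contrastive loss but does not give its form. A minimal sketch, assuming an InfoNCE-style objective in which the same instance in another frame is the positive and other instances are negatives; the function names and the temperature value are hypothetical, and the authors' actual loss may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def temporal_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log softmax of the positive among all candidates."""
    a = normalize(anchor)
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Embedding of one instance in frame t, and candidates from frame t + 1.
anchor = [1.0, 0.0]
same_instance = [0.9, 0.1]
other_instances = [[0.0, 1.0], [-1.0, 0.0]]

coherent = temporal_contrastive_loss(anchor, same_instance, other_instances)
incoherent = temporal_contrastive_loss(anchor, [0.0, 1.0], [same_instance, [-1.0, 0.0]])
```

The loss is small when an instance's embedding stays close to its own embedding in other frames and large when it drifts toward a different instance, which is the coherence property the abstract describes.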

Persuasion Strategies in Advertisements: Dataset, Modeling, and Baselines

Yaman Kumar Singla, Rajat Jha, Arunim Gupta, Milan Aggarwal, Aditya Garg, Ayush Bhardwaj, Tushar, Balaji Krishnamurthy, Rajiv Ratn Shah, Changyou Chen

Categories: Natural Language Processing | Computer Vision

2022-08-20

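Modeling what makes an advertisement persuasive, i.e., what elicits the desired response from consumers, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily because of the lack of benchmark datasets that provide persuasion-strategy labels associated with ads. Motivated by the persuasion literature in social psychology and marketing, we introduce an extensive vocabulary of persuasion strategies and build the first ad image corpus annotated with persuasion strategies. We then formulate the task of persuasion strategy prediction with multi-modal learning, for which we design a multi-task attention fusion model that can leverage other ad-understanding tasks to predict persuasion strategies. In addition, we conduct a real-world case study on 1600 advertising campaigns of 30 Fortune 500 companies, using our model's predictions to analyze which strategies work with different demographics (age and gender). The dataset also provides image segmentation masks, which label the persuasion strategies in the corresponding ad images on the test split. We publicly release our code and dataset at https://midas-research.github.io/persuasion-avertisements/.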

fMRI-S4: learning short- and long-range dynamic fMRI dependencies using 1D Convolutions and State Space Models

Ahmed El-Gazzar, Rajat Mani Thomas, Guido Van Wingen

Categories: Machine Learning

2022-08-08

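Single-subject mapping of resting-state brain functional activity to non-imaging phenotypes is a major goal of neuroimaging. The vast majority of learning approaches applied today rely on either static representations or short-term temporal correlations. This is at odds with the nature of brain activity, which is dynamic and exhibits both short- and long-range dependencies. Furthermore, new sophisticated deep learning approaches have been developed and validated on single tasks or datasets. Applying these models to study a different target typically requires exhaustive hyperparameter search, model engineering, and trial and error to obtain results competitive with simpler linear models. This in turn limits their adoption and hinders fair benchmarking in a rapidly developing area of research. To this end, we propose fMRI-S4, a versatile deep learning model for classifying phenotypes and psychiatric disorders from the time courses of resting-state functional magnetic resonance imaging scans. fMRI-S4 captures short- and long-range temporal dependencies in the signal using 1D convolutions and the recently introduced state space model S4. The proposed architecture is lightweight, sample-efficient, and robust across tasks and datasets. We validate fMRI-S4 on the tasks of diagnosing major depressive disorder (MDD), diagnosing autism spectrum disorder (ASD), and sex classification on three multi-site rs-fMRI datasets. We show that fMRI-S4 can outperform existing methods on all three tasks and can be trained as a plug-and-play model without special hyperparameter tuning for each setting.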
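
The reason a state space layer can carry long-range dependencies while still being computed as a convolution can be seen in a small example. The sketch below is a toy discrete linear SSM with a diagonal state matrix; it does not reproduce the S4 parameterization or the fMRI-S4 architecture, and the parameter values and function names are hypothetical.

```python
def ssm_recurrent(u, a, b, c):
    """Diagonal SSM run step by step: x_k = a * x_{k-1} + b * u_k, y_k = sum(c * x_k)."""
    x = [0.0] * len(a)
    ys = []
    for u_k in u:
        x = [a_i * x_i + b_i * u_k for a_i, x_i, b_i in zip(a, x, b)]
        ys.append(sum(c_i * x_i for c_i, x_i in zip(c, x)))
    return ys


def ssm_kernel(a, b, c, length):
    """Equivalent convolution kernel: K_j = sum_i c_i * a_i**j * b_i."""
    return [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(length)
    ]


def causal_conv(u, kernel):
    """Causal 1D convolution: y_k = sum_{j <= k} kernel[j] * u[k - j]."""
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


# One slowly decaying state (long memory) and one quickly decaying state (short memory).
a, b, c = [0.99, 0.5], [1.0, 1.0], [0.3, 0.7]
signal = [1.0, 0.0, 0.5, -0.2, 0.0, 0.0, 1.5, 0.0]

y_rec = ssm_recurrent(signal, a, b, c)
y_conv = causal_conv(signal, ssm_kernel(a, b, c, len(signal)))
```

Both routes give the same output: the recurrence is convenient for step-by-step processing, while the kernel form lets the whole time course be processed as one long convolution whose length matches the sequence.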
